Abstract. In this work, we provide non-asymptotic bounds for the average
speed of convergence of the empirical measure in the law of large numbers, 
in Wasserstein distance. We also consider occupation measures of ergodic 
Markov chains. One motivation is the approximation of a probability measure 
by finitely supported measures (the quantization problem). It is found that 
rates for empirical or occupation measures match or are close to previously 
known optimal quantization rates in several cases. This is notably highlighted 
in the example of infinite-dimensional Gaussian measures. 



1. Introduction 

This paper is concerned with the rate of convergence in Wasserstein distance for
the so-called empirical law of large numbers: let $(E, d, \mu)$ denote a measured Polish
space, and let

(1) $L_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$

denote the empirical measure associated with the i.i.d. sample $(X_i)_{1 \le i \le n}$ of
law $\mu$. Then, with probability 1, $L_n \rightharpoonup \mu$ as $n \to +\infty$ (convergence is understood
in the sense of the weak topology of measures). This theorem is also known as the
Glivenko-Cantelli theorem and is due in this form to Varadarajan [26].

For $1 \le p < +\infty$, the $p$-Wasserstein distance is defined on the set $\mathcal{P}_p(E)^2$ of
couples of measures with a finite $p$-th moment by

$W_p^p(\mu, \nu) = \inf_{\pi \in \mathcal{P}(\mu, \nu)} \int d^p(x, y) \, \pi(dx, dy)$

where the infimum is taken on the set $\mathcal{P}(\mu, \nu)$ of probability measures with first,
resp. second, marginal $\mu$, resp. $\nu$. This defines a metric on $\mathcal{P}_p(E)$, and convergence
in this metric is equivalent to weak convergence plus convergence of the moment of
order $p$. These metrics, and more generally the Monge transportation problem from
which they originate, have played a prominent role in several areas of probability,
statistics and the analysis of P.D.E.s: for a rich account, see C. Villani's St-Flour
course [27].
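As an illustration of the definition, here is a minimal Python sketch for the simplest case: two samples of equal size on the real line with $d(x, y) = |x - y|$. In that setting, pairing the order statistics is an optimal coupling, so $W_p$ between the two empirical measures has a closed form. The function name `wasserstein_1d` is ours and is not part of the paper.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p between the empirical measures of two equal-size real samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # On the real line with cost |x - y|**p, p >= 1, pairing the sorted
    # samples (order statistics) gives an optimal transport plan.
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)
```

In higher dimension no such shortcut is available, and one has to solve an assignment problem instead.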

Our purpose is to give bounds on the mean speed of convergence in $W_p$ distance
for the Glivenko-Cantelli theorem, i.e. bounds for the convergence $\mathbb{E}(W_p(L_n, \mu)) \to 0$.
Such results are desirable notably in view of numerical and statistical applications:
indeed, the approximation of a given probability measure by a measure
with finite support in Wasserstein distance is a topic that appears in various guises
in the literature, see for example [15]. The first motivation for this work was to
extend the results obtained by F. Bolley, A. Guillin and C. Villani [5] in the case of
variables with support in $\mathbb{R}^d$. As in that paper, we aim to produce bounds that are
non-asymptotic and effective (that is, with explicit constants), in order to achieve
practical relevance.

We also extend the investigation to the convergence of occupation measures for
suitably ergodic Markov chains: again, we have practical applications in mind, as
this allows one to use Metropolis-Hastings-type algorithms to approximate an unknown
measure (see Section 1.3 for a discussion of this).

There are many works in statistics devoted to convergence rates in some metric 
associated with the weak convergence of measures, see e.g. the book of A. Van der
Vaart and J. Wellner [25]. Of particular interest for us is R.M. Dudley's article [11],
see the Remark following Proposition 1.1.

Other works have been devoted to convergence of empirical measures in Wasserstein
distance; we quote some of them. Horowitz and Karandikar [17] gave a bound
for the rate of convergence of $\mathbb{E}[W_2^2(L_n, \mu)]$ to 0 for general measures supported
in $\mathbb{R}^d$ under a moment condition. M. Ajtai, J. Komlos and G. Tusnady [1] and
M. Talagrand [24] studied the related problem of the average cost of matching two
i.i.d. samples from the uniform law on the unit cube in dimension $d \ge 2$. This
line of research was pushed further, among others, by V. Dobric and J.E. Yukich
[10] or F. Barthe and C. Bordenave [2] (the reader may refer to this last paper for
an up-to-date account of the Euclidean matching problem). These papers give a
sharp result for measures in $\mathbb{R}^d$, with an improvement both over [TT] and [5]. In
the case $\mu \in \mathcal{P}(\mathbb{R})$, del Barrio, Gine and Matran [7] obtain a central limit theorem
for $W_1(L_n, \mu)$ under the condition that $\int_{-\infty}^{+\infty} \sqrt{F(t)(1 - F(t))} \, dt < +\infty$, where $F$
is the cumulative distribution function (c.d.f.) of $\mu$. In the companion paper [4],
we investigate the case of the $W_1$ distance by using the dual expression of the $W_1$
transportation cost by Kantorovich and Rubinstein; see therein for more references.

Before moving on to our results, we make a remark on the scope of this work. 
Generally speaking, the problem of convergence of $W_p(L_n, \mu)$ to 0 can be divided
into two separate questions:

• the first one is to estimate the mean rate of convergence, that is the convergence
rate of $\mathbb{E}[W_p(L_n, \mu)]$,

• while the second one is to study the concentration properties of $W_p(L_n, \mu)$
around its mean, that is to find bounds on the quantities

$\mathbb{P}(W_p(L_n, \mu) - \mathbb{E}[W_p(L_n, \mu)] \ge t)$.

Our main concern here is the first point. The second one can be dealt with
by techniques of measure concentration. We will elaborate on this in the case
of Gaussian measures (see Appendix A), but not in general. However, this is a
well-trodden topic, and some results are gathered in [3].

Acknowledgements. We thank Patrick Cattiaux for his advice and careful reading 
of preliminary versions, and Charles Bordenave for introducing us to his work [2] 
and connected works. 

1.1. Main result and first consequences. 
Definition 1.1. For $X \subset E$, the covering number of order $\delta$ for $X$, denoted by
$N(X, \delta)$, is defined as the minimal $n \in \mathbb{N}$ such that there exist $x_1, \ldots, x_n$ in $X$ with

$X \subset \bigcup_{i=1}^{n} B(x_i, \delta)$.
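For a finite point cloud, the covering number can be bounded from above by a greedy construction. The sketch below is only an illustration: the name `greedy_cover`, the use of closed balls, and the generic `dist` argument are our assumptions. The number of centers it returns is an upper bound on $N(X, \delta)$ for the finite set $X$, not the exact minimum.

```python
def greedy_cover(points, delta, dist):
    """Return centers taken from `points` whose closed delta-balls cover `points`."""
    centers = []
    for x in points:
        # Keep x as a new center only if no existing center already covers it.
        if all(dist(x, c) > delta for c in centers):
            centers.append(x)
    return centers
```

For instance, `len(greedy_cover(sample, delta, dist))` can be plugged in wherever an upper estimate of a covering number is needed.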

Our main statement is summed up in the following proposition. 

Proposition 1.1. Choose $t > 0$. Let $\mu \in \mathcal{P}(E)$ with support included in $X \subset E$
with finite diameter $d$ such that $N(X, t) < +\infty$. We have the bound:

$\mathbb{E}(W_p(L_n, \mu)) \le c \left( t + n^{-1/2p} \int_t^{d/4} N(X, \delta)^{1/2p} \, d\delta \right)$

with $c \le 64/3$.
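To get a feeling for the size of the right-hand side, one can evaluate it numerically once an upper estimate of the covering numbers is available. The following Python sketch does this with a midpoint rule; the function name `proposition_bound`, the callable `covering_number` (standing for $\delta \mapsto N(X, \delta)$) and the number of quadrature steps are assumptions of ours, and the default constant is simply the value 64/3 quoted above.

```python
def proposition_bound(t, n, p, d, covering_number, c=64.0 / 3.0, steps=10_000):
    """Evaluate c * (t + n**(-1/(2p)) * int_t^{d/4} N(X, delta)**(1/(2p)) d delta)."""
    upper = d / 4.0
    if not 0.0 < t <= upper:
        raise ValueError("need 0 < t <= d / 4")
    width = (upper - t) / steps
    # Midpoint rule; covering_number(delta) stands for N(X, delta).
    integral = sum(
        covering_number(t + (i + 0.5) * width) ** (1.0 / (2.0 * p))
        for i in range(steps)
    ) * width
    return c * (t + n ** (-1.0 / (2.0 * p)) * integral)
```

Minimizing the returned value over `t` on a grid gives a numerical counterpart of the optimization carried out by hand in the corollaries.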

Remark. Proposition 1.1 is related in spirit and proof to the results of R.M. Dudley
[11] in the case of the bounded Lipschitz metric

$d_{BL}(\mu, \nu) = \sup_{f \text{ 1-Lip}, \, |f| \le 1} \int f \, d(\mu - \nu)$.

The analogy is not at all fortuitous: indeed, the bounded Lipschitz metric is
linked to the 1-Wasserstein distance via the well-known Kantorovich-Rubinstein
dual definition of $W_1$:

$W_1(\mu, \nu) = \sup_{f \text{ 1-Lip}} \int f \, d(\mu - \nu)$.

The analogy stops at $p = 1$ since there is no representation of $W_p$ as an empirical
process for $p > 1$ (there is, however, a general dual expression of the transport cost).
In spite of this, the technique of proof in [11] proves useful in our case, and the
technique of using a sequence of coarser and coarser partitions is at the heart of
many later results, notably in the literature concerned with the problem of matching
two independent samples in Euclidean space, see e.g. [24] or the recent paper [2].

We now give a first example of application, under an assumption that the underlying
metric space is of finite-dimensional type in some sense. More precisely,
we assume that there exist $k_E > 0$, $\alpha > 0$ such that

(2) $N(E, \delta) \le k_E (\mathrm{Diam}\, E / \delta)^{\alpha}$.

Here, the parameter $\alpha$ plays the role of a dimension.

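Corollary 1.2. Assume that $E$ satisfies (2) and that $\alpha > 2p$. With notations as
earlier, the following holds:

$\mathbb{E}[W_p(L_n, \mu)] \le c \, \frac{\alpha}{\alpha - 2p} \, \mathrm{Diam}\, E \; k_E^{1/\alpha} \, n^{-1/\alpha}$

with $c \le 64/3$.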
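One way to carry out the optimization in $t$ behind this corollary is sketched below. This is our own worked computation, not taken from the paper: it assumes that one inserts the covering bound (2) into Proposition 1.1, extends the integral to $+\infty$ (legitimate since $\alpha > 2p$), and chooses $t = D \, k_E^{1/\alpha} n^{-1/\alpha}$, where $D = \mathrm{Diam}\, E$ is our shorthand.

```latex
\begin{align*}
n^{-1/2p} \int_t^{D/4} N(E,\delta)^{1/2p}\, d\delta
  &\le n^{-1/2p}\, k_E^{1/2p} D^{\alpha/2p} \int_t^{+\infty} \delta^{-\alpha/2p}\, d\delta
   = \frac{2p}{\alpha - 2p}\, n^{-1/2p}\, k_E^{1/2p} D^{\alpha/2p}\, t^{1-\alpha/2p}, \\
t = D\, k_E^{1/\alpha} n^{-1/\alpha}
  &\;\Longrightarrow\;
  t + \frac{2p}{\alpha - 2p}\, n^{-1/2p}\, k_E^{1/2p} D^{\alpha/2p}\, t^{1-\alpha/2p}
   = \Bigl(1 + \frac{2p}{\alpha - 2p}\Bigr) D\, k_E^{1/\alpha} n^{-1/\alpha}
   = \frac{\alpha}{\alpha - 2p}\, D\, k_E^{1/\alpha} n^{-1/\alpha}.
\end{align*}
```

With this choice both terms of Proposition 1.1 are of the same order $n^{-1/\alpha}$, which is where the exponent in the corollary comes from.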

Remark. In the case of measures supported in $\mathbb{R}^d$, this result is neither new nor
fully optimal. For a sharp statement in this case, the reader may refer to [2]
and references therein. However, we recover at least the exponent of $n^{-1/d}$, which
is sharp for $d \ge 3$, see [2] for a discussion. On the other hand, Corollary
1.2 extends to more general metric spaces of finite-dimensional type, for example
manifolds.

As opposed to Corollary 1.2, our next result is set in an infinite-dimensional
framework.

1.2. An application to Gaussian r.v.s in Banach spaces. We apply the results 
above to the case where $E$ is a separable Banach space with norm $\|\cdot\|$, and $\mu$ is the
law of a centered Gaussian random variable with values in $E$, meaning that the image of $\mu$
by every continuous linear functional $f \in E^*$ is a centered Gaussian variable in $\mathbb{R}$.
The couple $(E, \mu)$ is called a (separable) Gaussian Banach space.

Let $X$ be an $E$-valued r.v. with law $\mu$, and define the weak variance of $\mu$ as

$\sigma = \sup_{f \in E^*, \, |f| \le 1} \left( \mathbb{E} f^2(X) \right)^{1/2}$.

The small ball function of a Gaussian Banach space $(E, \mu)$ is the function

$\psi(t) = -\log \mu(B(0, t))$.

We can associate to the couple $(E, \mu)$ its Cameron-Martin Hilbert space H ⊂
E, see e.g. [TH] for a reference. It is known that the small ball function has 
deep links with the covering numbers of the unit ball of H, see e.g. Kuelbs-Li 
[18] and Li-Linde [21], as well as with the approximation of $\mu$ by measures with
finite support in Wasserstein distance (the quantization or optimal quantization
problem), see Fehringer's Ph.D. thesis [12], Dereich-Fehringer-Matoussi-Scheutzow 
[8], Graf-Luschgy-Pages QJj. 

We make the following assumptions on the small ball function : 

(1) there exists $\kappa > 1$ such that $\psi(t) \le \kappa \psi(2t)$ for $0 < t \le t_0$,

(2) for all $\varepsilon > 0$, $n^{-\varepsilon} = o(\psi^{-1}(\log n))$.

Assumption (2) implies that the Gaussian measure is genuinely infinite dimensional:
indeed, in the case when $\dim K < +\infty$, the measure is supported in a
finite-dimensional Banach space, and in this case the small ball function behaves
as $\log t$.

Theorem 1.3. Let $(E, \mu)$ be a Gaussian Banach space with weak variance $\sigma$ and
small ball function $\psi$. Assume that Assumptions (1) and (2) hold.
Then there exists a universal constant $c$ such that for all

$\log n \ge (6 + \kappa)(\log 2 \vee \psi(1) \vee \psi(t_0/2) \vee 1/\sigma^2)$,

the following holds:

(3) $\mathbb{E}(W_2(L_n, \mu)) \le c \left[ \psi^{-1}\left( \frac{1}{6 + \kappa} \log n \right) + \sigma n^{-1/[4(6 + \kappa)]} \right]$.

In particular, there is a $C = C(\mu)$ such that

(4) $\mathbb{E}(W_2(L_n, \mu)) \le C \psi^{-1}(\log n)$.

Moreover, for $\lambda > 0$,

(5) $W_2(L_n, \mu) \le (C + \lambda) \psi^{-1}(\log n)$ with probability $1 - \exp\left( -n \, \psi^{-1}(\log n)^2 \, \frac{\lambda^2}{2\sigma^2} \right)$.

Remark. Note that the choice of $6 + \kappa$ is not particularly sharp and may likely be
improved.
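To see what kind of rate $\psi^{-1}(\log n)$ represents, consider a hypothetical small ball function of polynomial type, $\psi(t) = t^{-a}$ with $a > 0$. This model is an assumption of ours chosen for illustration, not a statement about any particular Gaussian measure. It satisfies the doubling condition with $\kappa = 2^a$, and the resulting rate is $(\log n)^{-1/a}$, which is only logarithmic in $n$. The Python sketch below encodes this toy model; all function names are ours.

```python
import math

def psi(t, a=2.0):
    """Hypothetical small ball function psi(t) = t**(-a), an assumed toy model."""
    return t ** (-a)

def psi_inverse(y, a=2.0):
    """Inverse of the toy model: psi_inverse(psi(t)) == t for t > 0."""
    return y ** (-1.0 / a)

def rate(n, a=2.0):
    """The quantity psi^{-1}(log n) that governs the bound, for the toy model."""
    return psi_inverse(math.log(n), a)
```

For example, with `a = 2.0` the rate decays like $1/\sqrt{\log n}$, so multiplying the sample size by a constant factor barely changes the bound.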

In order to underline the interest of the result above, we introduce some definitions
from optimal quantization. For $n \ge 1$ and $1 \le r < +\infty$, define the optimal
quantization error at rate $n$ as

$\delta_{n,r}(\mu) = \inf_{\nu \in \mathcal{P}_n} W_r(\mu, \nu)$

where the infimum runs on the set $\mathcal{P}_n$ of probability measures with finite support
of cardinal bounded by $n$. Under some natural assumptions, the upper bound of
(5) is matched by a lower bound for the quantization error. Theorem 3.1 in [8]
states the following: if for every $0 < \zeta < 1$,

$\mu((1 - \zeta)\varepsilon B) = o(\mu(\varepsilon B))$ as $\varepsilon \to 0$,

then

$\delta_{n,r}(\mu) \gtrsim \psi^{-1}(\log n)$

(where $a_n \gtrsim b_n$ means $\liminf a_n / b_n \ge 1$).

In the terminology of quantization, Theorem 1.3 states that the empirical measure
is a rate-optimal quantizer with high probability (under some assumptions on
the small ball function). This is of practical interest, since obtaining the empirical 
measure is only as difficult as simulating an instance of the Gaussian vector, and 
one avoids dealing with computation of appropriate weights in the approximating 
discrete measure. 

We leave aside the question of determining the sharp asymptotics for the average 
error $\mathbb{E}(W_2(L_n, \mu))$, that is of finding $c$ such that $\mathbb{E}(W_2(L_n, \mu)) \sim c \psi^{-1}(\log n)$. Let
us underline that the corresponding question for quantizers is tackled for example 
in [22]. 

1.3. The case of Markov chains. We wish to extend the control of the speed of 
convergence to weakly dependent sequences, such as rapidly-mixing Markov chains. 
There is a natural incentive to consider this question : there are cases when one 
does not know how to sample from a given measure $\pi$, but a Markov chain with
stationary measure $\pi$ is nevertheless available for simulation. This is the basic
set-up of the Markov Chain Monte Carlo framework, and a very frequent situation,
even in finite dimension.

When looking at the proof of Proposition 1.1, it is apparent that the main
ingredient missing in the dependent case is the argument following (18), i.e. that
whenever $A \subset X$ is measurable, $n L_n(A)$ follows a binomial law with parameters $n$
and $\mu(A)$, and this must be remedied in some way. It is natural to look for some type
of quantitative ergodicity property of the chain, expressing almost-independence of
$X_i$ and $X_j$ in the long range ($|i - j|$ large).

We will consider decay-of-variance inequalities of the following form : 



(6) $\mathrm{Var}_\pi P^n f \le C \lambda^n \mathrm{Var}_\pi f$.



6 



EMMANUEL BOISSARD AND THIBAUT LE GOUIC 



In the reversible case, a bound of the type of (6) is ensured by Poincare or spectral
gap inequalities. We recall one possible definition in the discrete-time Markov chain
setting.

Definition 1.2. Let $P$ be a Markov kernel with reversible measure $\pi \in \mathcal{P}(E)$. We
say that a Poincare inequality with constant $C_P > 0$ holds if

(7) $\mathrm{Var}_\pi f \le C_P \int f (I - P^2) f \, d\pi$

for all $f \in L^2(\pi)$.
If (7) holds, we have

$\mathrm{Var}_\pi P^n f \le \lambda^n \mathrm{Var}_\pi f$

with $\lambda = (C_P - 1)/C_P$.
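The decay of variance can be checked by hand on the simplest reversible example, a two-state chain with transition probabilities $a$ (from state 0 to 1) and $b$ (from state 1 to 0). For this chain one has $\mathrm{Var}_\pi(Pf) = (1 - a - b)^2 \mathrm{Var}_\pi f$, so the inequality holds with $\lambda = (1 - a - b)^2$ and is in fact an equality. The Python sketch below verifies this numerically; the function name and the two-state setting are our illustration, not part of the paper.

```python
def two_state_variance_decay(a, b, f, n):
    """Return (Var_pi(P^n f), lam**n * Var_pi(f)) for the two-state kernel."""
    pi = (b / (a + b), a / (a + b))          # reversible stationary law
    P = ((1.0 - a, a), (b, 1.0 - b))

    def apply(g):
        # (P g)(i) = sum_j P[i][j] * g(j)
        return tuple(P[i][0] * g[0] + P[i][1] * g[1] for i in range(2))

    def var(g):
        mean = pi[0] * g[0] + pi[1] * g[1]
        return pi[0] * (g[0] - mean) ** 2 + pi[1] * (g[1] - mean) ** 2

    g = tuple(f)
    for _ in range(n):
        g = apply(g)
    lam = (1.0 - a - b) ** 2                 # squared second eigenvalue of P
    return var(g), lam ** n * var(f)
```

The two returned numbers agree up to rounding, which corresponds to the Poincare constant $C_P = 1/(1 - (1 - a - b)^2)$ in this toy case.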

More generally, one may assume that we have a control of the decay of the 
variance in the following form : 



(8) $\mathrm{Var}_\pi P^n f \le C \lambda^n \left\| f - \int f \, d\pi \right\|_{L^p}^2$.

As soon as $p > 2$, these inequalities are weaker than (6). Our proof would be
easily adaptable to this weaker decay-of-variance setting. We do not provide a 
complete statement of this claim. 

For a discussion of the links between Poincare inequality and other notions of 
weak dependence (e.g. mixing coefficients), see the recent paper [B].

For the next two theorems, we make the following dimension assumption on E : 
there exist $k_E > 0$ and $\alpha > 0$ such that for all $X \subset E$ with finite diameter,

(9) $N(X, \delta) \le k_E (\mathrm{Diam}\, X / \delta)^{\alpha}$.

The following theorem is the analogue of Corollary 1.2 under the assumption
that the Markov chain satisfies a decay-of-variance inequality. 

Theorem 1.4. Assume that $E$ has finite diameter $d > 0$ and (9) holds. Let
$\pi \in \mathcal{P}(E)$, and let $(X_i)_{i \ge 0}$ be an $E$-valued Markov chain with initial law $\nu$ such that
$\pi$ is its unique invariant probability. Assume also that (6) holds for some $C > 0$
and $\lambda < 1$.

Then if $2p > \alpha(1 + 1/r)$ and $L_n$ denotes the occupation measure $\frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$,
the following holds : 

for some universal constant c < 64/3. 

The previous theorem has the drawback of assuming that the state space has 
finite diameter. This can be circumvented, for example by truncation arguments. 
Our next theorem is an extension to the unbounded case under some moment 
conditions on $\pi$. The statement and the proof involve more technicalities than
Theorem 1.4, so we separate the two in spite of the obvious similarities.
Theorem 1.5. Assume that (9) holds. Let $\pi \in \mathcal{P}(E)$, and let $(X_i)_{i \ge 0}$ be an
$E$-valued Markov chain with initial law $\nu$ such that $\pi$ is its unique invariant probability.
Assume also that (6) holds for some $C > 0$ and $\lambda < 1$. Let $x_0 \in E$ and for all $\theta \ge 1$,
denote $M_\theta = \int d(x_0, x)^\theta \, d\pi$. Fix $r$ and $\zeta > 1$ and assume $2p > \alpha(1 + 1/r)(1 + 1/\zeta)$.

There exist two numerical constants $C_1(p, r, \zeta)$ and $C_2(p, r, \zeta)$ only depending on
$p$, $r$ and $\zeta$ such that whenever



{l-\)n 

the following holds : 



<Ci(p,r,C), 



n\\du\\ N i/Wi+iA)(i+i/C)] 



where 

' ni'^ ml+^'^ a(l + l/r) ^ 

2. Proofs in the independent case 

Lemma 2.1. Let $X \subset E$, $s > 0$ and $u, v \in \mathbb{N}$ with $u < v$. Suppose that
$N(X, 4^{-v} s) < +\infty$. For $u \le j \le v$, there exist integers

(10) $m(j) \le N(X, 4^{-j} s)$

and non-empty subsets $X_{j,l}$ of $X$, $u \le j \le v$, $1 \le l \le m(j)$, such that the sets
$X_{j,l}$, $1 \le l \le m(j)$, satisfy

(1) for each $j$, $(X_{j,l})_{1 \le l \le m(j)}$ is a partition of $X$,

(2) $\mathrm{Diam}\, X_{j,l} \le 4^{-j+1} s$,

(3) for each $j > u$, for each $1 \le l \le m(j)$ there exists $1 \le l' \le m(j - 1)$ such
that $X_{j,l} \subset X_{j-1,l'}$.

In other words, the sets $X_{j,l}$ form a sequence of partitions of $X$ that get coarser
as $j$ decreases (tiles at the scale $j - 1$ are unions of tiles at the scale $j$).

Proof. We begin by picking a set of balls $B_{j,l} = B(x_{j,l}, 4^{-j} s)$ with $u \le j \le v$ and
$1 \le l \le N(X, 4^{-j} s)$, such that for all $j$,

$X \subset \bigcup_{l=1}^{N(X, 4^{-j} s)} B_{j,l}$.

Define $X_{v,1} = B_{v,1}$, and successively set $X_{v,l} = B_{v,l} \setminus (X_{v,1} \cup \cdots \cup X_{v,l-1})$. Discard the
possible empty sets and relabel the existing sets accordingly. We have obtained the
finest partition, obviously satisfying conditions (1)-(2).

Assume now that the sets $X_{j,l}$ have been built for $k + 1 \le j \le v$. Set $X_{k,1}$
to be the union of all $X_{k+1,l'}$ such that $X_{k+1,l'} \cap B_{k,1} \ne \emptyset$. Likewise, define by
induction on $l$ the set $X_{k,l}$ as the union of all $X_{k+1,l'}$ such that $X_{k+1,l'} \cap B_{k,l} \ne \emptyset$
and $X_{k+1,l'} \not\subset X_{k,p}$ for $1 \le p < l$. Again, discard the possible empty sets and relabel
the remaining tiles. It is readily checked that the sets obtained satisfy assumptions
(1) and (3). We check assumption (2): let $x_{k,l}$ denote the center of $B_{k,l}$ and let
$y \in X_{k+1,l'} \subset X_{k,l}$. We have

$d(x_{k,l}, y) \le 4^{-k} s + \mathrm{Diam}\, X_{k+1,l'} \le 2 \times 4^{-k} s$,

thus $\mathrm{Diam}\, X_{k,l} \le 4^{-k+1} s$ as desired.

□

Consider as above a subset $X$ of $E$ with finite diameter $d$, and assume that
$N(X, 4^{-k} d) < +\infty$. Pick a sequence of partitions $(X_{j,l})_{1 \le l \le m(j)}$ for $1 \le j \le k$, as
per Lemma 2.1. For each $(j, l)$ choose a point $x_{j,l} \in X_{j,l}$. Define the set of points
of level $j$ as the set $L(j) = \{x_{j,l}\}_{1 \le l \le m(j)}$. Say that $X_{j',l'}$ is an ancestor of $X_{j,l}$ if
$X_{j,l} \subset X_{j',l'}$: we will denote this relation by $(j', l') \to (j, l)$.

The next two lemmas study the cost of transporting a finite measure $m_k$ to
another measure $n_k$ when these measures have support in $L(k)$. The underlying
idea is that we consider the finite metric space formed by the points $x_{j,l}$, $1 \le j \le k$,
as a metric tree, where points are connected to their ancestor at the previous level,
and we consider the problem of transportation between two masses at the leaves of 
the tree. The transportation algorithm we consider consists in allocating as much 
mass as possible at each point, then moving the remaining mass up one level in the 
tree, and iterating the procedure. 
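The allocate-then-move-up procedure can be sketched in a few lines of Python in a toy setting. The assumptions below are ours and are much more specific than the framework of the lemmas: the space is $[0, 1)$, level-$j$ tiles are the dyadic intervals of length $2^{-j}$, each tile is represented by its midpoint, and the cost is $W_1$ (that is, $p = 1$). The function returns the cost of the plan built by the procedure, which is an upper bound on $W_1$ but in general not its exact value.

```python
def tree_transport_cost(m, n):
    """Upper bound on W_1(m, n) for masses on the 2**k dyadic cells of [0, 1)."""
    size = len(m)
    if size != len(n) or size == 0 or size & (size - 1):
        raise ValueError("need two mass vectors of equal length 2**k")
    surplus_m, surplus_n = list(m), list(n)
    cost = 0.0
    while size > 1:
        width = 1.0 / size                    # cell width at the current level
        up_m, up_n = [0.0] * (size // 2), [0.0] * (size // 2)
        for i in range(size):
            # Allocate as much mass as possible in place ...
            matched = min(surplus_m[i], surplus_n[i])
            rest_m, rest_n = surplus_m[i] - matched, surplus_n[i] - matched
            # ... and move the rest to the parent cell, whose midpoint is
            # width / 2 away from the midpoint of the current cell.
            cost += (rest_m + rest_n) * width / 2.0
            up_m[i // 2] += rest_m
            up_n[i // 2] += rest_n
        surplus_m, surplus_n, size = up_m, up_n, size // 2
    return cost
```

For two unit masses sitting in adjacent cells that belong to different halves of $[0, 1)$, the procedure routes the mass through the root, so the returned cost can be noticeably larger than the true $W_1$ distance; this slack is the price paid for the simple level-by-level bound.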

A technical warning : please note that the transportation cost is usually defined 
between two probability measures ; however there is no difficulty in extending its 
definition to the transportation between two finite measures of equal total mass, 
and we will freely use this fact in the sequel. 

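Lemma 2.2. Let $m_j$, $n_j$ be measures with support in $L(j)$. Define the measures
$\tilde m_{j-1}$ and $\tilde n_{j-1}$ on $L(j-1)$ by setting

(11) $\tilde m_{j-1}(x_{j-1,l'}) = \sum_{(j-1,l') \to (j,l)} (m_j(x_{j,l}) - n_j(x_{j,l})) \vee 0$,

(12) $\tilde n_{j-1}(x_{j-1,l'}) = \sum_{(j-1,l') \to (j,l)} (n_j(x_{j,l}) - m_j(x_{j,l})) \vee 0$.

The measures $\tilde m_{j-1}$ and $\tilde n_{j-1}$ have the same mass, so the transportation cost between
them may be defined. Moreover, the following bound holds:

(13) $W_p(m_j, n_j) \le 2 \times 4^{-j+2} d \, \|m_j - n_j\|_{TV}^{1/p} + W_p(\tilde m_{j-1}, \tilde n_{j-1})$.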
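Proof. Set $m_j \wedge n_j (x_{j,l}) = m_j(x_{j,l}) \wedge n_j(x_{j,l})$. By the triangle inequality,

$W_p(m_j, n_j) \le W_p(m_j, m_j \wedge n_j + \tilde m_{j-1}) + W_p(m_j \wedge n_j + \tilde m_{j-1}, m_j \wedge n_j + \tilde n_{j-1}) + W_p(m_j \wedge n_j + \tilde n_{j-1}, n_j)$.

We bound the term on the left. Introduce the transport plan $\pi_m$ defined by

$\pi_m(x_{j,l}, x_{j,l}) = m_j \wedge n_j(x_{j,l})$,
$\pi_m(x_{j,l}, x_{j-1,l'}) = (m_j(x_{j,l}) - n_j(x_{j,l}))_+$ when $(j - 1, l') \to (j, l)$.

The reader can check that $\pi_m \in \mathcal{P}(m_j, m_j \wedge n_j + \tilde m_{j-1})$. Moreover,

$W_p(m_j, m_j \wedge n_j + \tilde m_{j-1}) \le \left( \int d^p(x, y) \, \pi_m(dx, dy) \right)^{1/p} \le 4^{-j+2} d \left( \sum_{l=1}^{m(j)} (m_j(x_{j,l}) - n_j(x_{j,l}))_+ \right)^{1/p}$.

Likewise,

$W_p(n_j, m_j \wedge n_j + \tilde n_{j-1}) \le 4^{-j+2} d \left( \sum_{l=1}^{m(j)} (n_j(x_{j,l}) - m_j(x_{j,l}))_+ \right)^{1/p}$.

As for the term in the middle, it is bounded by $W_p(\tilde m_{j-1}, \tilde n_{j-1})$. Putting this
together and using the inequality $x + y \le 2^{1 - 1/p}(x^p + y^p)^{1/p}$, we get

$W_p(m_j, n_j) \le 2^{1 - 1/p} \, 4^{-j+2} d \left( \sum_{l=1}^{m(j)} |m_j(x_{j,l}) - n_j(x_{j,l})| \right)^{1/p} + W_p(\tilde m_{j-1}, \tilde n_{j-1})$,

which is (13) since $\|m_j - n_j\|_{TV} = \frac{1}{2} \sum_{l=1}^{m(j)} |m_j(x_{j,l}) - n_j(x_{j,l})|$.

□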



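Lemma 2.3. Let $m_j$, $n_j$ be measures with support in $L(j)$. Define for $1 \le j' < j$
the measures $m_{j'}$, $n_{j'}$ with support in $L(j')$ by

(14) $m_{j'}(x_{j',l'}) = \sum_{(j',l') \to (j,l)} m_j(x_{j,l})$, $\quad n_{j'}(x_{j',l'}) = \sum_{(j',l') \to (j,l)} n_j(x_{j,l})$.

The following bound holds:

(15) $W_p(m_j, n_j) \le \sum_{j'=1}^{j} 2 \times 4^{-j'+2} d \, \|m_{j'} - n_{j'}\|_{TV}^{1/p}$.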



Proof. We proceed by induction on $j$. For $j = 1$, the result is obtained by using
the simple bound $W_p(m_1, n_1) \le d \, \|m_1 - n_1\|_{TV}^{1/p}$.

Suppose that (15) holds for measures with support in $L(j-1)$. By Lemma 2.2, we
have

$W_p(m_j, n_j) \le 2 \times 4^{-j+2} d \, \|m_j - n_j\|_{TV}^{1/p} + W_p(\tilde m_{j-1}, \tilde n_{j-1})$

where $\tilde m_{j-1}$ and $\tilde n_{j-1}$ are defined by (11) and (12) respectively. For $1 \le i < j - 1$,
define following (14)

$\tilde m_i(x_{i,l'}) = \sum_{(i,l') \to (j-1,l)} \tilde m_{j-1}(x_{j-1,l})$, $\quad \tilde n_i(x_{i,l'}) = \sum_{(i,l') \to (j-1,l)} \tilde n_{j-1}(x_{j-1,l})$.

We have

$W_p(m_j, n_j) \le 2 \times 4^{-j+2} d \, \|m_j - n_j\|_{TV}^{1/p} + \sum_{i=1}^{j-1} 2 \times 4^{-i+2} d \, \|\tilde m_i - \tilde n_i\|_{TV}^{1/p}$.

To conclude, it suffices to check that for $1 \le i \le j - 1$, $\|\tilde m_i - \tilde n_i\|_{TV} = \|m_i - n_i\|_{TV}$.

□



Proof of Proposition 1.1. We pick some positive integer $k$ whose value will be determined
at a later point. Introduce the sequence of partitions $(X_{j,l})_{1 \le l \le m(j)}$ for
$1 \le j \le k$ as in the lemmas above, as well as the points $x_{j,l}$. Define $\mu_k$ as the
measure with support in $L(k)$ such that $\mu_k(x_{k,l}) = \mu(X_{k,l})$ for $1 \le l \le m(k)$. The
diameter of the sets $X_{k,l}$ is bounded by $4^{-k+1} d$, therefore $W_p(\mu, \mu_k) \le 4^{-k+1} d$.
Let $L_n^k$ denote the empirical measure associated to $\mu_k$.

For $1 \le j \le k - 1$, define as in Lemma 2.3 the measures $\mu_j$ and $L_n^j$ with support
in $L(j)$ by

(16) $\mu_j(x_{j,l'}) = \sum_{(j,l') \to (k,l)} \mu_k(x_{k,l})$,

(17) $L_n^j(x_{j,l'}) = \sum_{(j,l') \to (k,l)} L_n^k(x_{k,l})$.

It is simple to check that $\mu_j(x_{j,l}) = \mu(X_{j,l})$, and that $L_n^j$ is the empirical measure
associated with $\mu_j$. Applying (15), we get

(18) $W_p(\mu_k, L_n^k) \le \sum_{j=1}^{k} 2 \times 4^{-j+2} d \, \|\mu_j - L_n^j\|_{TV}^{1/p}$.



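Observe that $n L_n^j(x_{j,l})$ follows a binomial law with parameters $n$ and $\mu(X_{j,l})$. The
expectation of $\|\mu_j - L_n^j\|_{TV}$ is bounded as follows:

$\mathbb{E}(\|\mu_j - L_n^j\|_{TV}) = \frac{1}{2} \sum_{l=1}^{m(j)} \mathbb{E}|(L_n^j - \mu_j)(x_{j,l})|$
$\le \frac{1}{2} \sum_{l=1}^{m(j)} \sqrt{\mathbb{E}(|(L_n^j - \mu_j)(x_{j,l})|^2)}$
$= \frac{1}{2} \sum_{l=1}^{m(j)} \sqrt{\frac{\mu(X_{j,l})(1 - \mu(X_{j,l}))}{n}}$
$\le \frac{1}{2} \sqrt{\frac{m(j)}{n}}$.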



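In the last inequality, we use Cauchy-Schwarz's inequality and the fact that
$(X_{j,l})_{1 \le l \le m(j)}$ is a partition of $X$. Putting this back in (18), we get

$\mathbb{E}(W_p(\mu_k, L_n^k)) \le 2^{5 - 1/p} \, n^{-1/2p} \sum_{j=1}^{k} 4^{-j} d \, m(j)^{1/2p}$
$\le 2^{5 - 1/p} \, n^{-1/2p} \sum_{j=1}^{k} 4^{-j} d \, N(X, 4^{-j} d)^{1/2p}$
$\le \frac{2^{7 - 1/p}}{3} \, n^{-1/2p} \int_{4^{-(k+1)} d}^{d/4} N(X, \delta)^{1/2p} \, d\delta$.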

In the last line, we use a standard sum-integral comparison argument. 
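The variance step used above, namely $\mathbb{E}|B/n - q| \le \sqrt{q(1 - q)/n}$ for $B$ binomial with parameters $n$ and $q$, can be checked exactly for small $n$ by summing over the binomial distribution. The short Python sketch below does so; the function names are ours.

```python
import math

def mean_abs_deviation(n, q):
    """Exact E|B/n - q| for B ~ Binomial(n, q), by summing over all outcomes."""
    return sum(
        math.comb(n, k) * q ** k * (1.0 - q) ** (n - k) * abs(k / n - q)
        for k in range(n + 1)
    )

def cauchy_schwarz_bound(n, q):
    """Standard deviation of B/n, which dominates the mean absolute deviation."""
    return math.sqrt(q * (1.0 - q) / n)
```

Equality holds for $n = 1$ and $q = 1/2$, which shows that the Cauchy-Schwarz step cannot be improved without further information on $q$.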
By the triangle inequality, we have

$W_p(\mu, L_n) \le W_p(\mu, \mu_k) + W_p(\mu_k, L_n^k) + W_p(L_n^k, L_n)$.

We claim that $\mathbb{E}(W_p(L_n^k, L_n)) \le W_p(\mu, \mu_k)$. Indeed, choose $n$ i.i.d. couples
$(X_i, X_i^k)$ such that $X_i \sim \mu$, $X_i^k \sim \mu_k$, and the joint law of $(X_i, X_i^k)$ achieves an
optimal coupling, i.e. $\mathbb{E}|X_i - X_i^k|^p = W_p^p(\mu, \mu_k)$. We have the identities in law

$L_n \sim \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$, $\quad L_n^k \sim \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i^k}$.

Choose the transport plan that sends $X_i$ to $X_i^k$: this gives the upper bound

$W_p^p(L_n, L_n^k) \le \frac{1}{n} \sum_{i=1}^{n} |X_i - X_i^k|^p$

and passing to expectation proves our claim.

Thus, $\mathbb{E}(W_p(\mu, L_n)) \le 2 W_p(\mu, \mu_k) + \mathbb{E}(W_p(\mu_k, L_n^k))$. Choose now $k$ as the largest
integer such that $4^{-(k+1)} d \ge t$. This imposes $4^{-k} d < 16 t$, and this finishes the
proof.

□

Proof of Corollary 1.2. It suffices to use Proposition 1.1 along with (2) and to
optimize in $t$. □

3. Proof of Theorem 1.3

Proof of Theorem 1.3. We begin by noticing that statement (5) is a simple consequence
of statement (4) and the tensorization of $T_2$: we have by Corollary A.2

$\mathbb{P}(W_2(L_n, \mu) \ge \mathbb{E}(W_2(L_n, \mu)) + t) \le e^{-n t^2/(2\sigma^2)}$

and it suffices to choose $t = \lambda \psi^{-1}(\log n)$ to conclude. We now turn to the other
claims.

Denote by $K$ the unit ball of the Cameron-Martin space associated to $E$ and $\mu$,
and by $B$ the unit ball of $E$. According to the Gaussian isoperimetric inequality
(see [H]), for all $\lambda > 0$ and $\varepsilon > 0$,

$\mu(\lambda K + \varepsilon B) \ge \Phi(\lambda + \Phi^{-1}(\mu(\varepsilon B)))$

where $\Phi(t) = \int_{-\infty}^{t} e^{-u^2/2} \, du / \sqrt{2\pi}$ is the Gaussian c.d.f.
Choose $\lambda > 0$ and $\varepsilon > 0$, and set $X = \lambda K + \varepsilon B$. Note

$\mu' = \frac{1}{\mu(X)} \mathbf{1}_X \, \mu$

the restriction of $\mu$ to the enlarged ball.

The diameter of $X$ is bounded by $2(\sigma\lambda + \varepsilon)$. The $W_2$ distance between $L_n$ and
$\mu$ is thus bounded as follows:

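(19) $\mathbb{E}(W_2(L_n, \mu)) \le 2 W_2(\mu, \mu') + ct + c n^{-1/4} \int_t^{(\sigma\lambda + \varepsilon)/2} N(X, \delta)^{1/4} \, d\delta$.

Set

(20) $I_1 = W_2(\mu, \mu')$

(21) $I_2 = t$

(22) $I_3 = n^{-1/4} \int_t^{(\sigma\lambda + \varepsilon)/2} N(X, \delta)^{1/4} \, d\delta$.

To begin with, set $\varepsilon = t/2$.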

Controlling $I_1$. We use transportation inequalities and the Gaussian isoperimetric
inequality. By Lemma A.1, $\mu$ satisfies a $T_2(2\sigma^2)$ inequality, so that we have

$W_2(\mu, \mu') \le \sqrt{2\sigma^2 H(\mu'|\mu)} = \sqrt{-2\sigma^2 \log \mu(\lambda K + \varepsilon B)}$
$\le \sqrt{-2\sigma^2 \log \Phi(\lambda + \Phi^{-1}(\mu(\varepsilon B)))}$
$= \sqrt{2} \sigma \sqrt{-\log \Phi(\lambda + \Phi^{-1}(e^{-\psi(t/2)}))}$.

Introduce the tail function of the Gaussian distribution

$T(x) = \frac{1}{\sqrt{2\pi}} \int_x^{+\infty} e^{-y^2/2} \, dy$.

We will use the fact that $\Phi^{-1} + T^{-1} = 0$, which comes from symmetry of the
Gaussian distribution. We will also use the bound $T(t) \le e^{-t^2/2}/2$, $t \ge 0$, and its
consequence

$T^{-1}(u) \le \sqrt{-2 \log u}$, $\quad 0 < u \le 1/2$.
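The Gaussian tail bound and its consequence for the inverse can be checked numerically with the complementary error function, using the identity $T(x) = \frac{1}{2} \mathrm{erfc}(x/\sqrt{2})$. The Python sketch below is only a sanity check of these two elementary facts; the function names are ours.

```python
import math

def gaussian_tail(x):
    """T(x) = P(G >= x) for a standard normal variable G."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_upper_bound(t):
    """The bound exp(-t**2 / 2) / 2, valid for t >= 0."""
    return 0.5 * math.exp(-t * t / 2.0)

def inverse_tail_upper_bound(u):
    """The bound sqrt(-2 log u) on T^{-1}(u), valid for 0 < u <= 1/2."""
    return math.sqrt(-2.0 * math.log(u))
```

The inverse bound follows from the direct one: if $u = T(t)$ with $t \ge 0$, then $u \le e^{-t^2/2}$, hence $t \le \sqrt{-2 \log u}$.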

We have

$\Phi^{-1}(e^{-\psi(t/2)}) = -T^{-1}(e^{-\psi(t/2)}) \ge -\sqrt{2\psi(t/2)}$

as soon as $\psi(t/2) \ge \log 2$. The elementary bound $\log \frac{1}{1 - x} \le 2x$ for $x \le 1/2$
yields

$\sqrt{-2 \log \Phi(u)} = \sqrt{2} \left( \log \frac{1}{1 - T(u)} \right)^{1/2} \le \sqrt{2} e^{-u^2/4}$

whenever $u \ge T^{-1}(1/2) = 0$. Putting this together, we have

(23) $I_1 \le \sqrt{2} \sigma e^{-(\lambda - \sqrt{2\psi(t/2)})^2/4}$

whenever

(24) $\psi(t/2) \ge \log 2$ and $\lambda - \sqrt{2\psi(t/2)} \ge 0$.

Controlling $I_3$. The term $I_3$ is bounded by $\frac{1}{2} n^{-1/4} (\sigma\lambda + t/2) N(X, t)^{1/4}$ (just
bound the integrand by its value at the lower endpoint $t$, where it is largest). Denote
$k = N(\lambda K, t - \varepsilon)$ the covering number of $\lambda K$ (w.r.t. the norm of $E$). Let $x_1, \ldots, x_k \in \lambda K$ be such
that the union of the balls $B(x_i, t - \varepsilon)$ contains $\lambda K$. From the triangle inequality we
get the inclusion

$\lambda K + \varepsilon B \subset \bigcup_{i=1}^{k} B(x_i, t)$.

Therefore, $N(X, t) \le N(\lambda K, t - \varepsilon) = N(\lambda K, t/2)$.

We now use the well-known link between $N(\lambda K, t/2)$ and the small ball function.
Lemma 1 in [18] gives the bound

$N(\lambda K, t/2) \le e^{\lambda^2/2 + \psi(t/4)} \le e^{\lambda^2/2 + \kappa \psi(t/2)}$,

so that

(25) $I_3 \le \frac{1}{2} (\sigma\lambda + t/2) \, e^{\frac{\lambda^2}{8} + \frac{\kappa}{4} \psi(t/2) - \frac{1}{4} \log n}$.

Remark that we have used the doubling condition on $\psi$, so that we require

(26) $t/4 \le t_0$.

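Final step. Set now $t = 2\psi^{-1}(a \log n)$ and $\lambda = 2\sqrt{2 a \log n}$, with $a > 0$ yet
undetermined. Using (23) and (25), we see that there exists a universal constant $c$
such that

$\mathbb{E}(W_2(L_n, \mu)) \le c \left[ \psi^{-1}(a \log n) + \sigma e^{-(a/2) \log n} + \left( \sigma \sqrt{a \log n} + \psi^{-1}(a \log n) \right) e^{[a(1 + \kappa/4) - 1/4] \log n} \right]$.

Choose $a = 1/(6 + \kappa)$ and assume $\log n \ge (6 + \kappa)(\log 2 \vee \psi(1) \vee \psi(t_0/2))$.
This guarantees that the technical conditions (24) and (26) are enforced, and that
$\psi^{-1}(a \log n) \le 1$. Summing up, we get:

$\mathbb{E}(W_2(L_n, \mu)) \le c \left[ \psi^{-1}\left( \frac{1}{6 + \kappa} \log n \right) + \left( 1 + \sigma \sqrt{\frac{1}{6 + \kappa} \log n} \right) n^{-1/(12 + 2\kappa)} \right]$.

Impose $\log n \ge (6 + \kappa)/\sigma^2$: this ensures $\sigma \sqrt{\frac{1}{6 + \kappa} \log n} \ge 1$. And finally, there
exists some $c > 0$ such that for all $x \ge 1$, $\sqrt{\log x} \, x^{-1/4} \le c$: this implies

$\sqrt{\frac{1}{6 + \kappa} \log n} \; n^{-1/(24 + 4\kappa)} \le c$.

This gives

$\left( 1 + \sigma \sqrt{\frac{1}{6 + \kappa} \log n} \right) n^{-1/(12 + 2\kappa)} \le c \sigma n^{-1/[4(6 + \kappa)]}$

and the proof is finished.

□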

4. Proofs in the dependent case 

We consider hereafter a Markov chain $(X_n)_{n \in \mathbb{N}}$ defined by $X_0 \sim \nu$ and the
transition kernel $P$. Let us denote by

$L_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$

its occupation measure.
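For a chain on a finite state space, the occupation measure is just the vector of visit frequencies. The Python sketch below simulates such a chain and computes its occupation measure; the finite-state setting, the dictionary encoding of the kernel, and the function names are assumptions of ours made for illustration.

```python
import random
from collections import Counter

def simulate_chain(kernel, start, n, seed=0):
    """Return X_1, ..., X_n; kernel[x] maps each next state to its probability."""
    rng = random.Random(seed)
    path, state = [], start
    for _ in range(n):
        targets = list(kernel[state])
        weights = [kernel[state][y] for y in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return path

def occupation_measure(path):
    """L_n = (1/n) * sum_i delta_{X_i}, returned as a dict state -> mass."""
    n = len(path)
    return {state: count / n for state, count in Counter(path).items()}
```

Comparing `occupation_measure(simulate_chain(kernel, start, n))` with the invariant law of `kernel` for growing `n` gives an empirical view of the convergence studied in this section.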



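Proposition 4.1. Suppose that the Markov chain satisfies (6) for some $C > 0$ and
$\lambda < 1$. Then the following holds:

(27) $\mathbb{E}_\nu(W_p(L_n, \pi)) \le c \left( t + \left( \frac{C}{(1 - \lambda) n} \left\| \frac{d\nu}{d\pi} \right\|_r \right)^{1/2p} \int_t^{d/4} N(X, \delta)^{\frac{1}{2p}(1 + 1/r)} \, d\delta \right)$.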

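Proof. An application of (15) as in (18) yields

(28) $\mathbb{E}(W_p(L_n, \pi)) \le 2 \times 4^{-k+1} d + \sum_{j=1}^{k} 2 \times 4^{-j+2} d \left( \sum_{l=1}^{m(j)} \mathbb{E}|(L_n - \pi)(X_{j,l})| \right)^{1/p}$.

Let $A$ be a measurable subset of $X$, and set $f_A(x) = \mathbf{1}_A(x) - \pi(A)$. We have

$\mathbb{E}|(L_n - \pi)(A)| = \frac{1}{n} \mathbb{E}_\nu \left| \sum_{i=1}^{n} f_A(X_i) \right| \le \frac{1}{n} \sqrt{\sum_{i,j=1}^{n} \mathbb{E}_\nu [f_A(X_i) f_A(X_j)]}$.

Let $p, q, r \ge 1$ be such that $1/p + 1/q + 1/r = 1$, and let $s$ be defined by
$1/s = 1/p + 1/q$. Now, using Holder's inequality with $r$ and $s$,

$\mathbb{E}_\nu [f_A(X_i) f_A(X_j)] \le \left\| \frac{d\nu}{d\pi} \right\|_r \left( \mathbb{E}_\pi |f_A(X_i) f_A(X_j)|^s \right)^{1/s}$.

Use the Markov property and the fact that $f \mapsto Pf$ is a contraction in $L^s$ to get

$\mathbb{E}_\nu [f_A(X_i) f_A(X_j)] \le \left\| \frac{d\nu}{d\pi} \right\|_r \| f_A P^{j-i} f_A \|_s$.

Finally, use Holder's inequality with $p$, $q$: we get

(29) $\mathbb{E}_\nu [f_A(X_i) f_A(X_j)] \le \left\| \frac{d\nu}{d\pi} \right\|_r \| f_A \|_p \| P^{j-i} f_A \|_q$.

Set $p = 2$ and note that for $1 \le t \le +\infty$, we have $\|f_A\|_t \le 2\pi(A)^{1/t}$. Use (6)
applied to the centered function $f_A$ to get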


and as a consequence, 



(30) E\{L^-n){A)\ < ^^^|| ^||y2^(A)V2-i/2^ 



Come back to (l28l) : we have 



i/p 



J=l \ /=1 



(1 — X)n dn 



□ 



Proof of Theorem 1.4. Use (27) and (9) to get



f + ylt-"/2p(l + l/'-) + l 



where 



A 



a{l + 1/?') CtTT 



Optimizing in t finishes the proof. 



We now move to the proof in the unbounded case. 



□ 



Proof of Theorem 1.5. We remind the reader that the following assumption stands:
for $X \subset E$ with diameter bounded by $d$,

(31) $N(X, \delta) \le k_E (d/\delta)^{\alpha}$.

In the following lines, we will make use of the elementary inequalities

(32) $(x + y)^p \le 2^{p-1}(x^p + y^p) \le 2^{p-1}(x + y)^p$.

Step 1.

Pick an increasing sequence of numbers $d_i > 0$ to be set later on, and some point
$x_0 \in E$. Define $C_1 = B(x_0, d_1)$, and $C_i = B(x_0, d_i) \setminus B(x_0, d_{i-1})$ for $i \ge 2$.
The idea is as follows: we decompose the state space $E$ into a union of rings,
and deal separately with $C_1$ on the one hand, using the case of Theorem 1.4 as
guideline, and with the union of the $C_i$, $i \ge 2$ on the other hand, where we use
more brutal bounds.

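We define partial occupation measures

$L_n^i = \frac{1}{n} \sum_{j=1}^{n} \delta_{X_j} \mathbf{1}_{X_j \in C_i}$

and their masses $m_i = L_n^i(E)$. We have the inequality

(33) $W_p^p(L_n, \pi) \le \sum_{i \ge 1} m_i W_p^p(1/m_i \, L_n^i, \pi)$.

On the other hand,

$W_p(1/m_i \, L_n^i, \pi) \le \left( \int d(x_0, x)^p \, d\pi \right)^{1/p} + \left( \int d(x_0, x)^p \, d(1/m_i \, L_n^i) \right)^{1/p}$

so that $W_p^p(1/m_i \, L_n^i, \pi) \le 2^{p-1}(M_p + d_i^p)$ using (32). Also, using (32) and (33)
yields

$W_p(L_n, \pi) \le m_1^{1/p} W_p(1/m_1 \, L_n^1, \pi) + 2^{1 - 1/p} \left( \sum_{i \ge 2} m_i [M_p + d_i^p] \right)^{1/p}$.

Pass to expectations to get

(34) $\mathbb{E}[W_p(L_n, \pi)] \le \mathbb{E}\left[ m_1^{1/p} W_p(1/m_1 \, L_n^1, \pi) \right] + 2^{1 - 1/p} \left( \sum_{i \ge 2} \pi(C_i) [M_p + d_i^p] \right)^{1/p}$.

We bound separately the left and right term in the right-hand side of (34),
starting with the right one.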
Step 2. 

Choose some $q > p$ and use Chebyshev's inequality to bound the sum on the
right by

(35) $\sum_{i \ge 2} \frac{M_q}{d_{i-1}^q} [M_p + d_i^p]$.

Take $d_i = \rho^i M_p^{1/p}$; (35) becomes



--MyM^-'^/P 



P' 



l-p-1 l-pp-1_ 

Assume for example that $\rho \ge 2$: this implies
[Mp + dP] < AMqMl -'-p



i>2 



For later use, we set $\zeta = q/p - 2$ and the above yields



(36) 2i-Vf [Mp + j < AmI[I^^M;^^+OIv p-C , 

Step 3. 

We now turn our attention to the term on the left in (34).
Once again, we apply (15) to obtain



k I ra(j) 

M^p(l/miii,7r)<rti+^4-^- ^ |((l/mi)L„ - 7r)(X,. , 

3=1 \ 1 = 1 

1 /v 

Multiply by and pass to expectations : 



i/p 



<3) 



1/p 



+ 4-'=diE(myP). 

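First, notice that $0 \le m_1 \le 1$ a.s. so that $\mathbb{E}(m_1^{1/p}) \le 1$. Next, write

$\sum_{l=1}^{m(j)} \mathbb{E}|(L_n - m_1 \pi)(X_{j,l})| \le \sum_{l=1}^{m(j)} \mathbb{E}\left( |(L_n - \pi)(X_{j,l})| + |(m_1 \pi - \pi)(X_{j,l})| \right)$
$\le \sum_{l=1}^{m(j)} \mathbb{E}|(L_n - \pi)(X_{j,l})| + \mathbb{E}(|m_1 - 1|) \pi(C_1)$
$\le \sum_{l=1}^{m(j)} \mathbb{E}|(L_n - \pi)(X_{j,l})| + \mathbb{E}|L_n(C_1) - 1|$.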

The first of these two terms is controlled using (30): we have

1 2V2C diy ^ 



Y^E\{L,,-n)iX,,)\ < 4=^^||^||y2^(,)i/2+i/2. 



1=1 

And on the other hand, 



$\mathbb{E}|L_n(C_1) - 1| \le \mathbb{E}|(L_n - \pi)(C_1)| + \pi(C_1^c)$



< 



Here we have used (30) again.
We skip over details here as they are similar to those in previous proofs. Choosing
an appropriate value for k and using the estimates above allows us to recover the 
following : 



(37) 

+ tt{CI) + 1. 

The term $\pi(C_1^c)$ is bounded by the Chebyshev inequality:

7r(C0 < j x'^dn/di = j x^dn (^j x^di!^ ' p-^ . 

Step 4.

Use (36) and (37), along with assumption (31): this yields



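$\mathbb{E}(W_p(L_n, \pi)) \le K(\zeta) \left( \rho^{-\zeta} + t + A_n \rho^{\frac{\alpha}{2p}(1 + 1/r)} t^{1 - \frac{\alpha}{2p}(1 + 1/r)} \right)$

where $A_n = \left( \frac{C}{(1 - \lambda) n} \left\| \frac{d\nu}{d\pi} \right\|_r \right)^{1/2p}$.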

The remaining step is optimization in $t$ and $\rho$. We obtain the following result:
there exists a constant $C(p, r, \zeta)$ depending only on the values of $p$, $r$, $\zeta$, such that

There is a caveat: we have used the condition $\rho \ge 2$ at some point, and with
this restriction the optimization above is valid only when $A_n \le C'(p, r, \zeta)$, where
the constant $C'(p, r, \zeta)$ only depends on the values of $p$, $r$, $\zeta$.

□ 



Appendix A. Transportation inequalities for Gaussian measures on a Banach space

Transportation inequalities, also called transportation-entropy inequalities, have 
been introduced by K. Marton [23] to study the phenomenon of concentration
of measure. M. Talagrand showed that the finite-dimensional Gaussian measures
satisfy a $T_2$ inequality. The following appendix contains a simple extension of this
result to the infinite-dimensional case. For much more on the topic of transportation
inequalities, the reader may refer to the survey [14] by N. Gozlan and C. Leonard.

For $\mu \in \mathcal{P}(E)$, let $H(\cdot|\mu)$ denote the relative entropy with respect to $\mu$:

$H(\nu|\mu) = \int \log \frac{d\nu}{d\mu} \, d\nu$

if $\nu \ll \mu$, and $H(\nu|\mu) = +\infty$ otherwise.

We say that $\mu \in \mathcal{P}_p(E)$ satisfies a $T_p(C)$ transportation inequality when

$W_p(\nu, \mu) \le \sqrt{C H(\nu|\mu)} \quad \forall \nu \in \mathcal{P}_p(E)$.
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We identify what kind of transport inequality is satisfied by a Gaussian measure 
on a Banach space. We remind the reader of the following definition : let {E, /i) be 
a Gaussian Banach space and X ^ fi he a i?- valued r.v.. The weak variance of /i 
or X is defined by 

$$\sigma^2 = \sup_{f \in E^*,\, |f| \le 1} \mathbb{E}\left(f^2(X)\right).$$
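As an illustration of this definition (an added sketch, not part of the original argument), consider the finite-dimensional case $E = \mathbb{R}^d$ with the Euclidean norm and a centered Gaussian measure with covariance matrix `cov`. Every $f$ in the unit ball of $E^*$ is of the form $f(x) = \langle u, x \rangle$ with $|u| \le 1$, so $\mathbb{E} f^2(X) = u^T \mathrm{cov}\, u$ and the weak variance is the largest eigenvalue of `cov`. The helper name `weak_variance` and the choice of power iteration are assumptions made for this example only.

```python
import math


def weak_variance(cov, iters=200):
    """Approximate the largest eigenvalue of a symmetric PSD matrix `cov`.

    Hypothetical helper (not from the paper): for a centered Gaussian on R^d
    with the Euclidean norm, this eigenvalue equals the weak variance sigma^2.
    Plain power iteration is used; it can fail if the start vector happens to
    be orthogonal to the top eigenvector.
    """
    d = len(cov)
    # Deliberately asymmetric start vector (1, 2, ..., d).
    u = [float(i + 1) for i in range(d)]
    lam = 0.0
    for _ in range(iters):
        v = [sum(cov[i][j] * u[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            # cov maps the current vector to 0, so the iteration cannot continue.
            return 0.0
        u = [c / norm for c in v]
        # Rayleigh quotient u^T cov u for the unit vector u.
        lam = sum(u[i] * cov[i][j] * u[j] for i in range(d) for j in range(d))
    return lam


if __name__ == "__main__":
    # Independent coordinates with variances 4 and 1: weak variance close to 4.
    print(weak_variance([[4.0, 0.0], [0.0, 1.0]]))
    # Correlated coordinates: eigenvalues are 1 and 3, weak variance close to 3.
    print(weak_variance([[2.0, -1.0], [-1.0, 2.0]]))
```

In this setting the weak variance can be much smaller than the total variance (the trace of `cov`), which is why it is the natural constant in the transport inequality stated next.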

The lemma below is optimal, as shown by the finite-dimensional case. 

Lemma A.1. Let $(E, \mu)$ be a Gaussian Banach space, and let $\sigma^2$ denote the weak
variance of $\mu$. Then $\mu$ satisfies a $T_2(2\sigma^2)$ inequality.
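To make the optimality claim concrete, here is a standard one-dimensional computation, added as an illustration. The symbols $s$, $m$ and $\nu_m$ are introduced for this example only. Translating a centered Gaussian on the real line gives equality in the $T_2(2\sigma^2)$ inequality, so the constant cannot be lowered in general.

```latex
% Illustration only: E = \mathbb{R} with the absolute value,
% \mu = N(0, s^2), \nu_m = N(m, s^2). The weak variance of \mu is \sigma^2 = s^2.
\begin{align*}
W_2(\nu_m, \mu) &= |m|
  && \text{(the translation $x \mapsto x + m$ is an optimal map)}, \\
H(\nu_m | \mu) &= \frac{m^2}{2 s^2}, \\
\sqrt{2 \sigma^2 \, H(\nu_m | \mu)} &= \sqrt{2 s^2 \cdot \frac{m^2}{2 s^2}} = |m| = W_2(\nu_m, \mu).
\end{align*}
```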

Proof. According e.g. to [20], there exists a sequence $(x_i)_{i \ge 1}$ in $E$ and an orthogaussian
sequence $(g_i)_{i \ge 1}$ (meaning a sequence of i.i.d. standard normal variables) such
that

$$\sum_{i \ge 1} g_i x_i \sim \mu,$$

where convergence of the series holds a.s. and in all the $L^p$'s. In particular, the
laws $\mu_n$ of the partial sums $\sum_{i=1}^n g_i x_i$ converge weakly to $\mu$.

As a consequence of the stability result of Djellout-Guillin-Wu (Lemma 2.2 in
[9]) showing that $T_2$ is stable under weak convergence, it thus suffices to show that
the measures $\mu_n$ all satisfy the $T_2(2\sigma^2)$ inequality.

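First, by definition of $\sigma$, we have

$$\sigma^2 = \sup_{f \in E^*,\, |f| \le 1} \mathbb{E}\Big(\sum_{i=1}^{+\infty} f(x_i) g_i\Big)^2,$$

and since $(g_i)$ is an orthogaussian sequence, the expectation is equal to $\sum_{i=1}^{+\infty} f^2(x_i)$.
Consider the mapping

$$T : (\mathbb{R}^n, N) \to (E, \|\cdot\|), \qquad (a_1, \ldots, a_n) \mapsto \sum_{i=1}^n a_i x_i$$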

(here $\mathbb{R}^n$ is equipped with the Euclidean norm $N$). With the remark above it is
easy to check that $\|T(a)\| \le \sigma N(a)$ for $a \in \mathbb{R}^n$. Consequently, $T$ is $\sigma$-Lipschitz, and
we can use the second stability result of Djellout-Guillin-Wu (Lemma 2.1 in [9]) :
the push forward of a measure satisfying $T_2(C)$ by an $L$-Lipschitz function satisfies
$T_2(L^2 C)$. As is well-known, the standard Gaussian measure $\gamma^n$ on $\mathbb{R}^n$ satisfies
$T_2(2)$ and thus $T_\# \gamma^n$ satisfies $T_2(2\sigma^2)$. But it is readily checked that $T_\# \gamma^n = \mu_n$,
which concludes this proof.

□ 
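The norm bound on $T$ used in the proof can be spelled out as follows. This is a sketch of the step described as easy to check: the first equality uses the Hahn-Banach theorem, and every supremum runs over $f \in E^*$ with $|f| \le 1$.

```latex
% Sketch of the step \|T(a)\| \le \sigma N(a); sup over f \in E^*, |f| \le 1.
\begin{align*}
\|T(a)\|
  &= \sup_{f} \Big| f\Big(\sum_{i=1}^n a_i x_i\Big) \Big|
   = \sup_{f} \Big| \sum_{i=1}^n a_i f(x_i) \Big| \\
  &\le N(a) \, \sup_{f} \Big( \sum_{i=1}^n f^2(x_i) \Big)^{1/2}
  && \text{(Cauchy-Schwarz)} \\
  &\le N(a) \, \sup_{f} \Big( \sum_{i \ge 1} f^2(x_i) \Big)^{1/2}
   = \sigma N(a).
\end{align*}
```

Since $T$ is linear, this norm bound is exactly the statement that $T$ is $\sigma$-Lipschitz.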

Remark. M. Ledoux indicated to us another way to obtain this result. First, one
shows that the Gaussian measure satisfies a $T_2(2)$ inequality when considering
the cost function $c = d_H^2$, where $d_H$ denotes the Cameron-Martin metric on $E$
inherited from the scalar product on the Cameron-Martin space. This can be done
in a number of ways, for example by tensorization of the finite-dimensional $T_2$
inequality for Gaussian measures or by adapting the Hamilton-Jacobi arguments
of Bobkov-Gentil-Ledoux [3] in the infinite-dimensional setting. It then suffices to
observe that this transport inequality implies the one we are looking for since we
have the bound $d \le \sigma d_H$ (here $d$ denotes the metric inherited from the norm of the
Banach space).

Let $L_n$ denote the empirical measure associated with $\mu$. As a consequence of
Lemma A.1 we can give an inequality for the concentration of $W_2(L_n, \mu)$ around
its mean, using results from transportation inequalities. This is actually a simple
case of more general results of N. Gozlan and C. Léonard ([13], [14]); we reproduce
a proof here for convenience.

Corollary A.2. Let $\mu$ be as above. The following holds :

Proof. The proof relies on the property of dimension-free tensorization of the $T_2$
inequality, see [13]. Since $\mu$ satisfies $T_2(2\sigma^2)$, the product measure $\mu^{\otimes n}$ on the
product space $E^n$ endowed with the $l_2$ metric

$$d_2((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = \sqrt{|x_1 - y_1|^2 + \ldots + |x_n - y_n|^2}$$

also satisfies a $T_2(2\sigma^2)$ inequality ([llj. Corollary 4.4). Therefore, it also
satisfies a $T_1$ inequality by Jensen's inequality, and this implies that we have the
concentration inequality

$$\mu^{\otimes n}\Big(f \ge \int f \, d\mu^{\otimes n} + t\Big) \le e^{-t^2/(2\sigma^2)}$$

for all 1-Lipschitz functions $f : (E^n, d_2) \to \mathbb{R}$ ([H, Theorem 1.7). For $x =
(x_1, \ldots, x_n) \in E^n$, denote $L_n^x = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$. To conclude it suffices to notice
that $(x_1, \ldots, x_n) \mapsto W_2(L_n^x, \mu)$ is $\frac{1}{\sqrt{n}}$-Lipschitz from $(E^n, d_2)$ to $\mathbb{R}$. □
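The last step of the proof can be unpacked as follows. This is a sketch in the notation of the proof, and the final display is what the argument yields when the concentration inequality is applied to $f(x) = \sqrt{n}\, W_2(L_n^x, \mu)$ at level $\sqrt{n}\, t$. Its form may differ from the original statement of the corollary.

```latex
% Sketch; x, y \in E^n, and \frac{1}{n}\sum_i \delta_{(x_i, y_i)} is used
% as a coupling of L_n^x and L_n^y.
\begin{align*}
|W_2(L_n^x, \mu) - W_2(L_n^y, \mu)|
  &\le W_2(L_n^x, L_n^y)
   \le \Big( \frac{1}{n} \sum_{i=1}^n |x_i - y_i|^2 \Big)^{1/2}
   = \frac{1}{\sqrt{n}} \, d_2(x, y), \\
\mathbb{P}\big( W_2(L_n, \mu) \ge \mathbb{E}\, W_2(L_n, \mu) + t \big)
  &\le e^{-n t^2 / (2 \sigma^2)}, \qquad t \ge 0.
\end{align*}
```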

References 

[1] M. Ajtai, J. Komlós, and G. Tusnády. On optimal matchings. Combinatorica, 4(4):259-264,
1984.

[2] F. Barthe and C. Bordenave. Combinatorial optimization over two random point sets, March
2011.

[3] S.G. Bobkov, I. Gentil, and M. Ledoux. Hypercontractivity of Hamilton-Jacobi equations.
Journal de Mathématiques Pures et Appliquées, 80(7):669-696, 2001.

[4] E. Boissard and T. Le Gouic. Exact deviations in 1-Wasserstein distance for empirical and
occupation measures, March 2011.

[5] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical
measures on non-compact spaces. Probability Theory and Related Fields, 137:541-593, 2007.

[6] P. Cattiaux, D. Chafaï, and A. Guillin. Central limit theorems for additive functionals of
ergodic Markov diffusions processes. Arxiv preprint arXiv:1104.2198, 2011.

[7] E. Del Barrio, E. Giné, and C. Matrán. Central limit theorems for the Wasserstein distance
between the empirical and the true distributions. Annals of Probability, 27(2):1009-1071,
1999.

[8] S. Dereich, F. Fehringer, A. Matoussi, and M. Scheutzow. On the link between small ball
probabilities and the quantization problem for Gaussian measures on Banach spaces. Journal
of Theoretical Probability, 16(1):249-265, 2003.

[9] H. Djellout, A. Guillin, and L. Wu. Transportation cost-information inequalities for random
dynamical systems and diffusions. Annals of Probability, 32:2702-2732, 2004.

[10] V. Dobrić and J.E. Yukich. Exact asymptotics for transportation cost in high dimensions. J.
Theoretical Prob, pages 97-118, 1995.






[11] R.M. Dudley. The speed of mean Glivenko-Cantelli convergence. The Annals of Mathematical
Statistics, 40(1):40-50, 1969.

[12] F. Fehringer. Kodierung von Gaußmaßen. 2001.

[13] N. Gozlan and C. Léonard. A large deviation approach to some transportation cost
inequalities. Probability Theory and Related Fields, 139:235-283, 2007.

[14] N. Gozlan and C. Léonard. Transport inequalities. A survey. Markov Processes and Related
Fields, 16:635-736, 2010.

[15] S. Graf and H. Luschgy. Foundations of quantization for probability distributions.
Springer-Verlag New York, Inc. Secaucus, NJ, USA, 2000.

[16] S. Graf, H. Luschgy, and G. Pagès. Functional quantization and small ball probabilities for
Gaussian processes. Journal of Theoretical Probability, 16(4):1047-1062, 2003.

[17] J. Horowitz and R.L. Karandikar. Mean rates of convergence of empirical measures in the
Wasserstein metric. Journal of Computational and Applied Mathematics, 55(3):261-273,
1994.

[18] J. Kuelbs and W.V. Li. Metric entropy and the small ball problem for Gaussian measures.
Journal of Functional Analysis, 116(1):133-157, 1993.

[19] M. Ledoux. Isoperimetry and Gaussian analysis. Lectures on probability theory and statistics,
pages 165-294, 1996.

[20] M. Ledoux and M. Talagrand. Probability in Banach spaces, volume 23 of Ergebnisse der
Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], 1991.

[21] W.V. Li and W. Linde. Approximation, metric entropy and small ball estimates for Gaussian
measures. The Annals of Probability, 27(3):1556-1578, 1999.

[22] H. Luschgy and G. Pagès. Sharp asymptotics of the functional quantization problem for
Gaussian processes. The Annals of Probability, 32(2):1574-1599, 2004.

[23] K. Marton. Bounding $\bar{d}$-distance by informational divergence: a method to prove measure
concentration. The Annals of Probability, 24(2):857-866, 1996.

[24] M. Talagrand. Matching random samples in many dimensions. The Annals of Applied
Probability, 2(4):846-856, 1992.

[25] A.W. Van der Vaart and J.A. Wellner. Weak convergence and empirical processes. Springer
Verlag, 1996.

[26] V.S. Varadarajan. On the convergence of sample probability distributions. Sankhyā: The
Indian Journal of Statistics, 19(1):23-26, 1958.

[27] C. Villani. Optimal transport. Old and new. Grundlehren der Mathematischen Wissenschaften
338. Berlin: Springer, xxii, 2009.

Université Paul Sabatier



